DanPO - a transcription-based dictionary for Danish speech technology
Abstract
We present a new strategy for the creation of phonetic lexicons. We argue that lexical resources intended for integration into speech technology should be informed by transcriptions of spontaneous speech. We illustrate the strategy with examples from the dictionary DanPO (Danish Phonetic-Orthographic Dictionary), which is being developed at the Center for Computational Modelling of Language (CMOL). As the reference corpus we used DanPASS, which consists of 57 recordings of task-oriented monologues transcribed by professional and MA-level phoneticians using the Danish SAMPA phonetic alphabet. Dictionaries and concordances were compiled from the transcriptions, and these resources were merged with the (prescriptive) phonetic renderings of a standard Danish word dictionary of 87,000 lemmata. As an effect of this "transcription-informed" strategy, DanPO is expected to significantly improve the success rate of automatic speech recognizers as well as the naturalness of artificial voices. Furthermore, we devise an experimental strategy for evaluating the dictionary and improving later versions.
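The merging step described in the abstract can be pictured with a minimal sketch. It is not the DanPO implementation: the function names, the data layout, and the pronunciation strings (placeholders such as `P_FULL`, not real Danish SAMPA) are all assumptions made for illustration. It counts the pronunciations observed per orthographic word in a transcribed corpus and merges them with a prescriptive lexicon, keeping the prescriptive form first.

```python
from collections import Counter, defaultdict


def compile_observed(tokens):
    """Count observed pronunciations per (lower-cased) orthographic word.

    `tokens` is an iterable of (word, pronunciation) pairs, as one might
    extract from an aligned orthographic/phonetic transcription.
    """
    observed = defaultdict(Counter)
    for word, pron in tokens:
        observed[word.lower()][pron] += 1
    return observed


def merge_lexicons(canonical, observed):
    """Merge prescriptive entries with corpus-observed variants.

    Returns {word: [(pronunciation, corpus_count, is_canonical), ...]} with
    the canonical form (if any) first, then variants by descending count.
    """
    merged = {}
    for word in set(canonical) | set(observed):
        counts = observed.get(word, Counter())
        canon = canonical.get(word)
        entries = []
        if canon is not None:
            entries.append((canon, counts.get(canon, 0), True))
        for pron, n in counts.most_common():
            if pron != canon:
                entries.append((pron, n, False))
        merged[word] = entries
    return merged


# Hypothetical data: placeholder pronunciation labels, not real SAMPA.
tokens = [("Og", "P_REDUCED"), ("og", "P_REDUCED"),
          ("og", "P_FULL"), ("hus", "H_OBSERVED")]
canonical = {"og": "P_FULL", "by": "B_FULL"}

merged = merge_lexicons(canonical, compile_observed(tokens))
for word in sorted(merged):
    print(word, merged[word])
```

In this toy layout, a word found only in the corpus gets no canonical entry, and a word found only in the prescriptive lexicon keeps its canonical form with a corpus count of zero. How DanPO actually weights or selects variants is not stated in the abstract.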
Similar resources
Speech Enhancement using Adaptive Data-Based Dictionary Learning
In this paper, a speech enhancement method based on sparse representation of data frames has been presented. Speech enhancement is one of the most applicable areas in different signal processing fields. The objective of a speech enhancement system is improvement of either intelligibility or quality of the speech signals. This process is carried out using the speech signal processing techniques ...
A New Method for Speech Enhancement Based on Incoherent Model Learning in Wavelet Transform Domain
The quality of a speech signal degrades significantly in the presence of environmental noise, which leads to imperfect performance of hearing aid devices, automatic speech recognition systems, and mobile phones. In this paper, single-channel enhancement of speech signals corrupted by additive noise is considered. A dictionary-based algorithm is proposed to train the speech...
Deriving a bi-lingual dictionary from raw transcription data
We present a bigram-based method for deriving bi-lingual dictionary entries from two corpora of spontaneous speech (as represented in transcriptions). In contrast to e.g. [1], our method does not require translated or otherwise aligned texts; the corpora representing the source and target languages may be unrelated wrt. size, vocabulary richness, frequency distribution, and activity type. Examp...
Analysis of phonetic transcriptions for Danish automatic speech recognition
Automatic speech recognition (ASR) relies on three resources: audio, orthographic transcriptions and a pronunciation dictionary. The dictionary or lexicon maps orthographic words to sequences of phones or phonemes that represent the pronunciation of the corresponding word. The quality of a speech recognition system depends heavily on the dictionary and the transcriptions therein. This paper pre...
Arabic broadcast news transcription system
This paper describes the development of an Arabic broadcast news transcription system. The presented system is a speaker-independent large vocabulary natural Arabic speech recognition system, and it is intended to be a test bed for further research into the open ended problem of achieving natural language man-machine conversation. The system addresses a number of challenging issues pertaining t...
Publication date: 2005